INTRODUCTION


Looking at the history of instructional material
development, it has been found that much initial effort was
spent in generating material and selecting media. A number
of decisions were required. Many problems, such as the
selection of format, mode of response, reinforcement, etc.,
had to be solved during this initial period. If we look at
some other fields having the same kind of problems, we see
that these problems are being solved with the help of
validated models and there are very few decisions left to be
made. In the field of education generally, and in
instructional education especially, we have not been able to
find such models in existence, though much research should
have been done in this area. If we look at the literature,
there are indications that people feel the need for such a
model (Smith & Hurry 1975). Murril & Boutwell (1975) have
commented that the mathematical evidence and specific
component justification for current instructional development
methods lack empirical verification. Baker (1973) has even
suggested that many of the procedures prescribed in the
instructional development literature were based upon faith
alone. A book edited by Mayer (1975) points out the
importance of clear-cut guiderules in instructional
design.

We can see very clearly that there is a fundamental
problem in the field of instructional education. The absence
of robust, active, validated models or sets of guiderules to
help the developer determine the best material and
procedures for the student does and will continue to affect
our standard of education.

At present it would be unfair to say that our
researchers have not paid any attention to this
ever-present problem. Quite a few instructional programs have
been developed over the years, yet in each case the program
developer had to create a unique model to answer the design 
questions for each program. Simple basic questions regarding 
the operations of the program had no ready answers available 
which were empirically based or validated. In the absence of 
readily available answers and since there was no method to 
conveniently simulate various outcomes to arrive at the 
answers, each program became an exercise in rediscovery 
through trial and error. As a result the model developed for 
a program became suitable only for that particular program 
and it was not possible to generalize it for other programs. 
This is the situation in which an instructional product 
developer usually finds himself. 

If a model could be developed for instructional 
education, it would give the developer a system and a method 
for testing out and selecting various combinations of the 
product components in order to achieve desired target 
behavior. Components such as accuracy level, length of 
lesson, response rate, etc. could be arranged to result in 
the fastest learning at the least cost. A model like this 
should be specific to the outcomes rather than the content 
so that its basic algorithms could be applied to many
different programs. Each program can have a different
arrangement of components depending upon the required
outcome. If a model like this existed, it would have 
resulted in the early development of instructional programs 
and their speedy validation. The result would have been a 
tremendous saving of time and cost in the field of 
education. 

In reviewing the general history of instructional 
development it can be seen that the absence of such models 
is one of the most overriding problems in the area of 
instructional education. The obvious problem then is that no 
model exists which has been tested and validated and is
generalizable to a variety of instructional products. The
potential benefits to be derived from even a modest model
are sufficiently great to place this problem in a
high-priority category. The emphasis is placed upon the need
for validated workable models or guiderules which can assist 
the instructional developer in the construction of teaching 
material and procedures. 

At the Behavioral Sciences Institute, Carmel, 
California, considerable work is being done in this area. 
They have developed some models and are in the process of 
validating them. In an early study Madson (1972) attempted 
to form a model for language learning on the basis of a 
Markov chain process. Oertel (1975) showed the nonexistence
of any etiological factors. The author, in doing this work 
for arithmetic programming, is pursuing the same theory and 
is attempting to produce the guiderules which are so badly 
needed. 

MODEL DEVELOPMENT

BACKGROUND


Before we go about developing our model it is necessary
to review the events which started the development of such
a model. Since 1885, when some work was done by Ebbinghaus,
experimental studies on learning have been recorded and
reported in quantitative form. The first application of
mathematics was for the purpose of describing empirical
functions. A learning curve was the most common method of
reporting the results of a learning experiment: a graph
representing the changes in the performance of a subject or
group of subjects over successive practice trials for
particular experimental conditions. Several analytic
functions were proposed as learning functions. The main
argument raised against these functions was that none of
them was derived from fundamental considerations about the
nature of learning. All of them fit well, with the closest
fit to the data usually obtained by the function that had
more free parameters.

In 1919 Thurstone set up a system of axioms based on
psychological considerations that led to the derivation of
rational learning functions. A very specific set of
psychological identifications was used as the parameters.
Moreover, Thurstone was the one to suggest a probabilistic
approach. He took as his aim the derivation of the
probability of a correct response as a function of trial
number. The same theory was later extended to the analysis
of discrimination learning and transposition by Gulliksen
and Wolfle (1938). However, only mean response curves were
considered and no attention was paid to the prediction of
response distributions and sequential statistics. Moreover,
no procedures were devised for parameter estimation and no
experiments were done to test the validity of the parameters
of the model. Another group of experimenters attempted to
derive learning curves from simplified conceptual models of
the nervous system, but their efforts did not have any
significant impact on the experimental investigation of
learning.


The pioneer of theoretical learning was Clark Hull. In
his major work, Principles of Behavior (1943), a number of
postulates were stated which dealt with a number of
variables that had not been identified in the earlier
experiments. A postulate in many cases was simply a
generalization of empirical results. It was hoped that the
aggregation of postulates would jointly imply much more than
the specified experimental facts from which they were
individually derived. Hull aimed for comprehensiveness in
his theory. Partially due to its relative clarity and
generality, the theory stimulated considerable experimental
research. It has gone through a variety of modifications and
still guides the research of many contemporary
experimenters. The most important contribution by Hull was
the statement of a rich collection of qualitative concepts
and propositions, some of which have had a lasting influence
on the thinking of psychologists.


Somewhat later many other researchers started
formulating their stochastic models for learning. At the
same time another group worked on developing what has come
to be known as linear models for learning. The basic idea
of linear models is very simple. In a two-choice learning
experiment, the probability that the subject will make
response 1 on trial n is $p_n$. On each trial the subject
responds and some reinforcing event is provided. If
reinforcing event j occurs on trial n, the new value of
response probability on trial n+1 is

$$p_{n+1} = a_j p_n + b_j$$

This equation expresses the new value of response
probability as a linear function of its old value. The
parameters $a_j$ and $b_j$ specify whether event j effects an
increase or decrease in $p_n$.
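The linear update above can be sketched in a few lines of Python. This is only an illustration: the function name `linear_update`, the value of `THETA`, and the two event definitions are assumptions chosen for the example and do not come from the studies cited.

```python
def linear_update(p, a, b):
    """Return p_{n+1} = a * p_n + b for one reinforcing event (a, b)."""
    p_next = a * p + b
    if not 0.0 <= p_next <= 1.0:
        raise ValueError("parameters move the probability outside [0, 1]")
    return p_next


# Hypothetical two-event example: event 1 pushes p toward 1, event 2 toward 0.
THETA = 0.2  # assumed learning-rate value, for illustration only
EVENTS = {
    1: (1 - THETA, THETA),  # p -> (1 - theta) * p + theta
    2: (1 - THETA, 0.0),    # p -> (1 - theta) * p
}

p = 0.5
for event in (1, 1, 2):
    a, b = EVENTS[event]
    p = linear_update(p, a, b)
```

Each event moves the probability a fixed fraction of the way toward its limit, which is why such models predict gradual rather than all-or-none change.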

At the same time work was being done on Markov chain
models with fewer states, and they represent an especially
promising line of theoretical development. The basis of the
original development was a paper by Estes (1959). Basic to
this formulation is the idea that a subject's response
probability can take on only a fixed set of values and that
reinforcing events produce transitions from one value of
response probability to another.

It has been proposed that performance in the
experimental situation can be represented by three discrete
performance levels: 0, p, and 1. In these terms learning
consists of two all-or-none transitions from lower to higher
levels of response probability. This notion was originated
by Estes, who also introduced the technique of representing
learning by a Markov chain. It was because of Estes' prior
theoretical work that we were led to examine our data for
evidence of an intermediate performance level. In truth we
have been astonished by the consistency with which such
evidence has appeared throughout the range of data
examined.

It will be noted that the evidence comes from
experimental situations in which initially the probability
of a correct response is zero and asymptotically it is unity.
Such zero-to-one situations possess an important advantage
for our method of data analysis. The arrangement enables one
to identify responses between the first success and last
failure as occurring in the intermediate state. The
importance of this identification can be understood if one
imagines trying to test decisively the notion of a single
intermediate state for a learning situation in which the
initial response probability is greater than zero or the
asymptote is less than unity, or both. In such cases the
evidence has to be of a more indirect nature, like predicting
quantitative details of a variety of statistics. We know
that data showing an intermediate performance level can be
interpreted within the framework of stimulus sampling
theory. Facts about an intermediate performance level can
also be interpreted in terms of the multistage models of
Restle and Greeno (1970). In constructing and testing the
three-stage model, we have suppressed the stimulus sampling
rationale and have presented simply a descriptive model of
learning.

The learning model exploits the notion of an
intermediate state in an obvious way. Certain general
Markovian properties were imposed regarding transition
probabilities among the states, and the resulting model
provided a fairly adequate description of the data on which
it was tested. The specific form of the model is not
entirely arbitrary, since we have been able to reject various
plausible alternative three-stage models. One of the
models we have tested permits a direct, one-trial transition
from the starting state to the terminal absorbing state.
This alternative is diagrammed in Figure 1. Here it is
assumed that with probability (1 - d) the subject skips the
intermediate p state, going directly to state 1. The
alternative classes of learning models which can be
considered are the continuous or incremental theories such
as the linear operator models. Although extensive
comparisons have not been undertaken, it seems evident that
all continuous models will be rejected for this kind of data.
In particular, from continuous models one would expect
performance to improve monotonically over trials between the
first success and last error. Such upward trends simply
failed to materialize in any of the studies. Our tests for
such trends were the chi-square test and the rank order
correlation between intermediate trials and response
probabilities. In none of the many cases considered was this
correlation significantly different from zero, a result in
line with the stationarity assumption. It might be objected
that possible effects on the intermediate responses of
individual differences in learning were not considered. To
answer this objection experiments were conducted by Bush and
Mosteller (1955). Two points were made from the results
observed: one, that the argument of selection artifacts does
not really rescue the continuous models from the
stationarity data, and two, that the statistical tests we
routinely use to assess stationarity of intermediate
responses have considerable power to reject the null
hypothesis when it is false.

Fig 1. A Three Stage Model

MODEL DEVELOPMENT


A brief review of mathematical learning theory by
Atkinson, Bower, and Crothers (1965) indicates that the
treatment of learning with probability models started in
1919. From 1919 to 1950 there were quite a few models
proposed and tested. All of them were specific to certain
learning situations. From 1950 onward there has been much
work done in the area of stochastic learning. This resulted
in two theories, the linear model and the Markov model. The
linear model basically depends upon the theory that the
probability of success for a subject on trial n is given by
the equation

$$p_n = 1 - (1 - p_1)(1 - \theta)^{n-1} \qquad (1)$$

where $p_1$ is the initial probability of success and
$\theta$ is the subject's learning rate.

The Markov model depends upon a different theory, which
states that if a subject is in the unlearned state (u) then
the probability of a correct response is g (guess). If the
subject is in the learned state (L), then the probability of
a correct response is 1. The probability of going from the
unlearned state to the learned state on any presolution
trial is c. The probability of a correct response on any
trial n is given by

$$P_n = 1 - (1 - g)(1 - c)^{n-1} \qquad (2)$$

A comparison of equations (1) and (2) indicates that their
forms are exactly the same. The difference in these
equations lies in their theoretical background and the
meaning of the parameters. Equation (1) states that a
subject starts with a probability $p_1$ of making a correct
response on the first trial. The probability of success on
the second trial is greater due to incremental learning
achieved on the first trial. The linear process continues
indefinitely and the subject's probability of success
approaches 1 asymptotically. Equation (2) states that on
each presolution trial a subject has a probability c of
going into solution. Once in solution the subject stays in
solution and always responds correctly, and this probability
remains constant. The forms of these two equations are
compared by Restle and Greeno (1970). Based on their
analysis it is stated "...the all-or-none theory is most
interesting and we think it is the one most deserving of
future work".

Pilot research involving a computer simulation of the
linear model suggested that it is inappropriate for
mathematical learning. The study of data from students showed
that the Markov principles of stationarity and independence
are applicable to this program. Based on these results this
work was done under the Markovian (all-or-none) principle.
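The contrast between the two theories can be made concrete with a short Python sketch. It assumes the standard mean-curve forms, success probability 1 - (1 - p1)(1 - theta)^(n-1) for the linear model and 1 - (1 - g)(1 - c)^(n-1) for the all-or-none model; the function names and the simulation routine are hypothetical and are not the pilot simulation mentioned in the text.

```python
import random


def linear_model_p(n, p1, theta):
    """Mean success probability on trial n under the linear model."""
    return 1.0 - (1.0 - p1) * (1.0 - theta) ** (n - 1)


def all_or_none_p(n, g, c):
    """Mean success probability on trial n under the all-or-none model."""
    return 1.0 - (1.0 - g) * (1.0 - c) ** (n - 1)


def simulate_all_or_none(g, c, n_trials, rng):
    """Simulate one subject: guess with probability g until learning occurs."""
    learned = False
    correct = []
    for _ in range(n_trials):
        correct.append(True if learned else rng.random() < g)
        # After each presolution trial the subject enters solution with prob. c.
        if not learned and rng.random() < c:
            learned = True
    return correct


# The mean curves coincide when p1 = g and theta = c ...
same_curve = all(
    abs(linear_model_p(n, 0.25, 0.1) - all_or_none_p(n, 0.25, 0.1)) < 1e-12
    for n in range(1, 50)
)
# ... but an individual all-or-none subject jumps from chance to perfect.
one_subject = simulate_all_or_none(0.25, 0.1, 30, random.Random(1))
```

The two models are therefore distinguished by the trial-by-trial structure of individual sequences, such as stationarity before the last error, rather than by the mean learning curve.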

ASSUMPTIONS


For the development of the model, the following
assumptions are necessary:

1. The learning process is Markovian in nature.

2. The subject can be correct on the first trial of any
step by either (a) being in solution prior to the trial, (b)
going into solution because of the information presented in
the first stimulus, or (c) guessing correctly in presolution.
This assumption modifies equation (2), in that equation (2)
contains the restriction that for the subject to be correct
on the first response, he must guess correctly; it therefore
does not allow the possibility of being in solution (the
learned state) on the first trial. Allowing for the
possibility that the subject is in solution on the first
trial (Atkinson, 1965) appears to be a more realistic
approach and was used in this work.

3. The g factor in presolution is a function of step
and the subject.

4. The c factor is a function of step and the subject.

5. g and c are constant over any step for a given
subject.

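6. The set of outcomes forms a homogeneous Markov chain
over the learned state (L) and the unlearned state (u), with
transition matrix

$$\begin{array}{c|cc} & L & u \\ \hline L & 1 & 0 \\ u & c & 1 - c \end{array} \qquad n = 0, 1, 2, \ldots$$

MODEL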


The equations developed in this work are based on the
work done by Atkinson, Bower, and Crothers (1965), Coombs
(1970), Restle (1970), Gray (1972), and Madson (1972).
Since it is difficult to give credit to one source, only the
equations are given with explanations. The first important
quantity is the probability of an errorless response string
given that the subject is in the unlearned state (u). This
state is assumed on the first trial and known to exist if an
error occurs before reaching the advancement criterion. If no
error occurs then there is no way to find out whether the
subject was in the learned state (L) or was in the unlearned
state and performed as follows:

$$P(\text{errorless} \mid u) = c + g(1 - c)c + g^2(1 - c)^2 c + \cdots = \rho \qquad (3)$$

In the future whenever we refer to this probability we
shall call it rho, the probability of errorless response
given that the subject is in the unlearned state. Summing the
geometric series gives $\rho = c / [1 - g(1 - c)]$. The above
equation says that either the subject goes into the learned
state on the first trial, or stays in the unlearned state,
guesses correctly and then goes into the learned state, or
stays in the unlearned state twice, guesses correctly twice
and then goes into the learned state, etc. The development
indicates that the subject goes into the learned state
eventually if errorless response is achieved after an error.
The reader familiar with Markov theory will note that the
term relating to remaining in the unlearned state and having
errorless responses was omitted in developing equation (4).
The omission was made since the term

$$g^n (1 - c)^n$$

goes to zero in the limit as n approaches infinity.
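As a numerical check on the series for rho, the following Python sketch compares a truncated sum with the closed form c / [1 - g(1 - c)] obtained by summing the geometric series. The function names and the parameter values are assumptions made for illustration; they are not estimates from the study.

```python
def rho_partial_sum(g, c, terms):
    """Sum the first `terms` terms of c + g(1-c)c + g^2(1-c)^2 c + ..."""
    ratio = g * (1.0 - c)
    return sum(c * ratio ** k for k in range(terms))


def rho_closed_form(g, c):
    """Closed form of the geometric series: c / (1 - g(1 - c))."""
    denominator = 1.0 - g * (1.0 - c)
    if denominator == 0.0:
        raise ValueError("series does not converge for g = 1 and c = 0")
    return c / denominator


# Hypothetical parameter values, for illustration only.
G, C = 0.5, 0.1
rho_series = rho_partial_sum(G, C, 200)
rho_exact = rho_closed_form(G, C)
```

The omitted term, [g(1 - c)]^n, is the remainder of this series; with these values it is already negligible after a few dozen trials.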

The next development will be the expected number of
errors given g and c. The probability that the total number
of errors is k is

$$P(K = k) = (1 - \rho)^k \rho, \qquad k = 0, 1, 2, \ldots$$

This would represent every feasible combination of
events in which exactly k errors can occur. By using
standard mathematical tables we can reduce the expected
value to the following:

$$E[K] = \frac{1 - \rho}{\rho} \qquad (5)$$

In words, equation (5) gives the expected number of
response strings required until the last error, after
which the subject is in the learned state.

Since the probability of an errorless response string
is rho, given that the subject is in an unlearned state, it
follows that the probability of an error response is
(1 - rho). This takes into account all possible numbers of
correct responses before the error response which breaks the
string. The occurrence of an error demonstrates the unlearned
state and
also allows for another possible string of errorless 
responses which is independent of the length of previous 
strings and depends only on being in the unlearned state. 

The next development is the expected trial number of
the last error. The probability that the last error occurred
on trial t equals

$$P[T = 0] = \rho$$

$$P[T = t] = (1 - c)^t (1 - g)\rho, \qquad t = 1, 2, 3, \ldots \qquad (6)$$

In words, equation (6) says that there were t trials in the
unlearned state, indicated by an error on trial t, and then
errorless responses. The probability statement allows for any
sequence or number of correct and incorrect responses up to
trial t. The only required knowledge is that an error
occurred on trial t and then no more errors.


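To find the expected value of T:

$$E[T] = \sum_{t=0}^{\infty} t\,P[T = t] = (1 - g)\rho \sum_{t=1}^{\infty} t(1 - c)^t = \frac{(1 - g)(1 - c)\rho}{c^2}$$

Solving by using the previous relations, in particular
$\rho = c / [1 - g(1 - c)]$, the sum of the series in
equation (3),

$$E[T] = \frac{(1 - g)(1 - c)}{c\,[1 - g(1 - c)]} \approx \frac{1}{c} \quad \text{for small } c$$

so that

$$c \approx \frac{1}{E[T]}$$

This equation says that c is approximately the inverse of
the expected trial number of the last error. This is
intuitively appealing, as it states that the larger the
factor c (probability of going into solution), the fewer the
expected number of trials.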
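How good the approximation of c by the inverse of the expected last-error trial is can be checked numerically. The sketch below assumes the last-error distribution P[T = t] = (1 - c)^t (1 - g) rho with rho = c / [1 - g(1 - c)]; the function names and parameter values are hypothetical.

```python
def rho(g, c):
    """Probability of an errorless string from the unlearned state."""
    return c / (1.0 - g * (1.0 - c))


def expected_last_error_closed(g, c):
    """E[T] = (1 - g)(1 - c) * rho / c^2."""
    return (1.0 - g) * (1.0 - c) * rho(g, c) / c ** 2


def expected_last_error_numeric(g, c, max_t=5000):
    """Sum t * P[T = t] directly, with P[T = t] = (1 - c)^t (1 - g) rho."""
    r = rho(g, c)
    return sum(t * (1.0 - c) ** t * (1.0 - g) * r for t in range(1, max_t + 1))


# Hypothetical values: guessing rate 0.5, learning rate 0.05.
G, C = 0.5, 0.05
e_closed = expected_last_error_closed(G, C)
e_numeric = expected_last_error_numeric(G, C)
ratio = e_closed * C  # equals 1 when c is exactly 1 / E[T]
```

With these values the ratio is about 0.9, and it moves toward 1 as c shrinks, so the inverse relation is best read as a small-c approximation rather than an identity.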

VERIFICATION OF MODEL


Subjects 

All subjects from whom data were obtained for this 
analysis were public school students. They attended classes 
for the educationally handicapped in the state of 
Pennsylvania. All were going through the Monterey Arithmetic 
program which was developed by Behavioral Sciences Institute 
in Carmel, California. The number of subjects used in this 
analysis was 48. There were 20 girls and 28 boys. The age 
range was between 5 and 11 years. Their IQ ranged from 60 to 
80. The subjects were randomly selected for analysis by the 
supervisor in Pennsylvania. There was no effort to constrain
subject selection by age, sex, etiology or any other
parameter.

Data Source


The subjects were given problems to solve. Depending 
upon what subprogram they were in, they performed addition,
subtraction, multiplication or division. When a subject 
completed a problem it was checked by a teacher for 
accuracy. Depending upon the outcome it was marked as a 
correct or incorrect response. Thus, for the purposes of 
this study, each problem which was worked was counted as one 
response and each lesson was comprised of a sequential 
string of responses. 

The total number of responses was 3000. For any subject
the sequence of responses generated in a single lesson
consisted of two parts: first, a string consisting of
correct and incorrect responses, and second, a string of 10
continuously correct responses. Some of the response strings
were not used in the analysis. The string of continuous
correct responses indicates a solution state, and since we
were considering only the presolution state, the string of
continuous correct responses was not utilized. There were
480 responses in this category. The situations where the
subject started with correct responses and did not make any
error indicated that the subject was already in the solution
state. The responses in situations like this were not used.
The number of responses of this kind was 320. In situations
where the subject did not complete the lesson, he gave us no
indication of the number of responses necessary to go into
the solution state. We were also unable to use those
responses. The number of responses of this type was 1196.
After disregarding all the responses mentioned above we were
left with a total of 1004 responses, which comprised 48
strings of correct and incorrect responses (lessons). Thus
each subject contributed one response string to the data
pool.







Program 


The arithmetic program consists of material and 
procedures which are specially designed for the purpose of 
achieving a high degree of skill and accuracy in the 
computation of arithmetic problems. It is divided into four 
subprograms of addition, subtraction, multiplication, and 
division. Each subprogram consists of 42 steps. These steps 
are in increasing order of difficulty. The first step is 
very basic and the last step is most difficult. A subject
completing the last step is considered capable of performing 
all the calculations of that subprogram. This program is 
designed to be used in a classroom but it can be 
administered on an individual basis. It is useful for both 
kinds of students, those who did not have any arithmetic 
before and those who had had it but could not achieve the 
required accuracy level. This program is applicable to all 
students of all ages and takes into consideration all kinds 
of differences which occur among them. It uses a locator 
test which helps the teacher to place each student at the 
appropriate location in the program. It also uses an 
automatic branching procedure which takes care of slow
learners. This program is built in such a way that the 
teacher can respond equally to both remedial and 
developmental students. 

ANALYSIS and RESULT


The raw data consisted of 48 strings of correct and
incorrect responses. For this analysis values of 0 and 1
were assigned to correct and incorrect responses,
respectively. The data are shown in appendix A. As the
basic characteristics of a Markov chain process are
independence and stationarity, and since other aspects of
performance are closely related to these properties, it was
decided to test the data for these two characteristics. The
procedure for the tests was the same as proposed by Oertel
(1975) for pooled data. Independence was tested by
calculating for each subject the observed frequency of the
four possible combinations (1-1, 1-0, 0-0, 0-1) and then
computing the value of chi-square by the appropriate formula
for a 2x2 contingency table (incorporating the correction for
continuity). Whenever a subject had a cell entry less than
5, the data were combined with as many adjacent subjects as
necessary to get a frequency of at least 5. The chi-square
values were then summed. The results are shown in Table 1
and the observed values in appendix B. The table shows that
the data have the property of independence.
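The independence test described above can be sketched as follows. This is a minimal illustration, assuming the usual Yates-corrected chi-square formula for a 2x2 table; the function names and the example response string are hypothetical and are not data from the study, and the pooling of adjacent subjects is not shown.

```python
from collections import Counter


def transition_counts(responses):
    """Count the four adjacent-pair types in a 0/1 response string."""
    pairs = Counter(zip(responses, responses[1:]))
    return {pair: pairs.get(pair, 0) for pair in [(1, 1), (1, 0), (0, 0), (0, 1)]}


def yates_chi_square(a, b, c, d):
    """Chi-square for the 2x2 table [[a, b], [c, d]] with continuity correction."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    # Yates correction: shrink |ad - bc| by n/2, never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / denominator


# Hypothetical response string (0 = correct, 1 = incorrect), not study data.
counts = transition_counts([0, 1, 1, 0, 0, 0, 1, 0])
```

A small chi-square value means the response on one trial tells little about the response on the next, which is what the independence property requires.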

For testing stationarity the proportions of correct
responses in the first and second halves were compared. The
difference in proportions for each subject was tested by a
direct difference t test. The results are in Table 2 and
they establish the property of stationarity.
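A direct difference (paired) t statistic of the kind described can be computed as in this sketch. The function name and the example proportions are assumptions for illustration and are not the values of Table 2.

```python
import math


def direct_difference_t(first_half, second_half):
    """Paired t statistic for per-subject differences in proportion correct."""
    diffs = [x - y for x, y in zip(first_half, second_half)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two subjects")
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if variance == 0.0:
        raise ValueError("all differences are identical")
    # t = mean difference divided by its standard error, with n - 1 d.f.
    return mean / math.sqrt(variance / n)


# Hypothetical proportions correct for four subjects, not study data.
t_value = direct_difference_t([0.6, 0.4, 0.7, 0.5], [0.5, 0.5, 0.5, 0.5])
```

The statistic is then compared with the critical value for n - 1 degrees of freedom; a value below the critical value is consistent with stationarity.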

Once the properties of independence and stationarity
were confirmed, the next step was to find the distribution
of L (number of responses). To find the distribution a
histogram was plotted (appendix C). The distribution
appeared to be exponential. A chi-square goodness-of-fit
test was used to test the null hypothesis that the
distribution was exponential. The test did not reject the
null hypothesis. Calculations are shown in appendix D.
Since the data were discrete, it was decided to test the data
for a negative binomial or a geometric distribution.
A Kolmogorov-Smirnov goodness-of-fit test was done to find
the distribution. The results of the test are shown in Table
3, and the linear relationship between the observed and
generated data is shown in appendix E. From the table we
can see that the data best fit the geometric distribution
with q = 0.96. This gives c, the maximum absolute difference
in cumulative distribution function, as 0.12, and the
probability of occurrence is 0.7167. The value of alpha for
the test was 0.1. Once the distribution was confirmed we
were able to predict the percentage of students in the
solution state for any given number of responses using the
cumulative distribution function table shown in appendix F.
The values of L (number of responses) for different
percentages are given in Table 4.
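The goodness-of-fit statistic used here, the maximum absolute difference between the empirical and the theoretical cumulative distribution functions, can be sketched as below. The parameterization is an assumption: the geometric distribution is taken to have support 1, 2, 3, ... with P(L = k) = (1 - q) q^(k-1), which may differ from the one used in the original analysis, and the function names and sample values are hypothetical.

```python
def geometric_cdf(k, q):
    """P(L <= k) for an assumed geometric law P(L = k) = (1 - q) * q**(k - 1)."""
    return 1.0 - q ** k if k >= 1 else 0.0


def ks_statistic(sample, q):
    """Maximum absolute difference between empirical and geometric c.d.f."""
    n = len(sample)
    largest_gap = 0.0
    # Both c.d.f.s are step functions on the integers, so checking every
    # integer up to the sample maximum covers the largest possible gap.
    for k in range(1, max(sample) + 1):
        empirical = sum(1 for x in sample if x <= k) / n
        largest_gap = max(largest_gap, abs(empirical - geometric_cdf(k, q)))
    return largest_gap


# Hypothetical numbers of responses per lesson, not study data.
gap = ks_statistic([3, 8, 15, 21, 40, 64], 0.96)
```

Repeating the calculation over a grid of q values and keeping the smallest gap mirrors the comparison of candidate parameters reported in Table 3.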

The next step was to find the estimated value of the
parameter c. From our theoretical background we know that c
is approximately the inverse of the expected number of
incorrect responses T. To find the expected value of T for
any given number of responses a regression analysis was
carried out between T and L. The result was a linear
equation with a value of r = 0.8673:

L = 4.8T + 3.3

The expected values of L for any given T are shown in Table
5. Similarly, expected values of T for different L are
shown in the same table. Hence for any L we were able to
find the value of T and so the value of c. The values of L,
T and c for different accuracy levels (Q) are given in Table
6.
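The fitted line can be used in both directions, as in this short Python sketch. The slope and intercept are the reported regression values; the function names are hypothetical, and obtaining T from L by inverting the fitted line is an assumption about how the second half of Table 5 was produced.

```python
SLOPE = 4.8      # reported regression slope
INTERCEPT = 3.3  # reported regression intercept


def responses_from_errors(t):
    """Expected number of responses L for a given number of errors T."""
    return SLOPE * t + INTERCEPT


def errors_from_responses(l):
    """Number of errors T implied by L, by inverting the fitted line (assumed)."""
    return (l - INTERCEPT) / SLOPE


def c_from_errors(t):
    """Approximate learning parameter c as the inverse of T."""
    if t <= 0:
        raise ValueError("T must be positive")
    return 1.0 / t


expected_l = responses_from_errors(1)  # about 8.1 responses for one error
implied_t = errors_from_responses(100)  # about 20.1 errors in 100 responses
c_estimate = c_from_errors(3)           # about 0.333
```

Rounding these values gives figures close to the entries of Table 5, although the rounding rule used in the original tables is not stated.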


The next step was to find some kind of representation
or trend from the number of incorrect responses within the
first 10, 15 or 20 responses. This was attempted to enable
us to predict the expected number of responses from a
subject to reach the solution state and to find a branching
criterion. The relationship of the density, sequence, and
patterning of incorrect responses to the total number of
responses was examined. Unfortunately we were unable to
find any significant trends or relationships.

FIGURE II. Graph of Cumulative Distribution Function

Table 1

Chi-square values for independence of transition
probabilities

subject      A      B      C      D    Chi square value
1-9          5     25     33     30    .0027
10          10      8      9     30    .0010
11-24       12     45     57    165    .000044
25-40        8     43     53    176    .00011
41-48        5     23     29    116    .000034

total                                  .00388

Table 2

Tabulated values of the proportion of correct responses
in first and second half and the values of a direct
difference t test

subject    1st half    2nd half    diff
1          3/3         3/3         0
2          5/6         5/6         0
3          6/7         6/7         0
4          14/18       15/18       1
5          16/23       15/23       1
6          5/5         4/5         1
7          3/4         3/4         0
8          3/3         2/3         1
9          4/5         4/5         0
10         3/5         4/5         1
11         10/11       9/11        1
12         24/27       24/27       0
13         3/8         3/8         0
14         6/10        9/10        3
15         5/5         5/5         0
16         12/13       11/13       1
17         20/20       20/20       0
18         2/2         2/2         0
19         2/3         2/3         0
20         3/7         5/7         2
21         7/8         7/8         0
22         2/2         1/2         1
23         13/16       14/16       1
24         2/2         1/2         1
25         4/5         4/5         0
26         6/7         5/7         1
27         19/22       13/22       6
28         2/4         3/4         1
29         2/2         1/2         1
30         7/7         5/7         2
31         15/18       15/18       0
32         5/8         4/8         1
33         21/29       18/29       3
34         7/10        8/10        1
35         14/20       14/20       0
36         13/17       12/17       1
37         16/19       13/19       3
38         5/5         3/5         2
39         12/15       10/15       2
40         5/7         5/7         0
41         5/5         4/5         1
42         5/6         5/6         0
43         5/6         5/6         0
44         4/4         3/4         1
45         30/34       30/34       0
46         1/1         1/1         0
47         16/24       17/24       1
48         6/7         6/7         0

total      392/488     372/488     20

t (observed) = 1.45

t (critical) = 2.01

Result: The data had the property of stationarity

Table 3

Kolmogorov-Smirnov goodness-of-fit test for the number of
responses (L) to the negative binomial and geometric
distributions

distribution         parameter                   c       p
negative binomial    alpha = 27.36, K = 0.91     0.98    0.00000
geometric            q = 0.35                    0.54    0.00000
                     q = 0.95                    0.14    0.5487
                     q = 0.96                    0.12    0.7167 *
                     q = 0.97                    0.20    0.1786
                     q = 0.99                    0.52    0.0000

c = absolute difference in c.d.f.
p = prob. of occurrence

Table 4

Tabled values of the number of responses (L) required for a
given percentage of students to be in the solution state at
a specific level of confidence

in solution        confidence level (percent)
(percent)          80      90      95      99
50                  5       6       7       9
60                 10      11      12      14
75                 23      24      26      29
80                 28      29      31      36
85                 35      37      40      47
90                 47      50      55      70
95                 63      69      82    >200
96                 69      76      96    >200

Table 5


Tabled values of the total number of responses (L) given T,
or the expected number of errors (T) given L


      T to L                 L to T
    T        L             L        T

    1        8            10        1
    2       13            20        3
    3       18            30        5
    4       22            40        7
    5       27            50        9
    6       32            60       11
    7       37            70       13
    8       42            80       15
    9       46            90       18
   10       51           100       20






Table 6


Tabled values of T, C, and L for a given percentage of
students in solution and a given accuracy level

                          percentage in solution

           50           75           80           85           90           95
  Q      T    C       T    C       T    C       T    C       T    C       T    C

 .5      3  .333     12  .083     14  .071     18  .055     25  .040     34  .029
 .4      2  .416     10  .104     12  .086     15  .067     20  .050     28  .036
 .3      2  .555      7  .139      9  .115     11  .090     15  .067     21  .048
 .25     1  .999      6  .167      7  .138      9  .067     12  .080     17  .058
 .2      1  .999      5  .208      6  .172      7  .135     10  .100     14  .072
 .15     0  .999      4  .277      4  .230      6  .180      7  .133     10  .097
 .1      0    -       2  .416      3  .345      4  .270      5  .200      7  .145
 .05     0    -       1  .999      1  .690      2  .540      2  .400      3  .290

  L      5           22           27           34           45           60


Q = (1-P), probability of incorrect response
DISCUSSION and SUMMARY 


The basic idea behind this work was to develop some 
guidelines to help the designer of the learning program in 
deciding, before the program is run, the required amount of 
work to be performed by the students and the teacher. The 
ability to make this decision validly would be helpful in 
speeding learning and cutting down the costs. For these 
reasons model verification was required. First of all the 
data was observed to see the kind of process that would be 
useful. As we know there are two kinds of models in 
existence, the linear model and the stochastic model. It was 
especially necessary to see whether the data agreed with the 
stochastic model, since there are certain parameters—namely 
L, T, C—which, if determined correctly, would enable us to 
predict values which are very close to observed values. The 
work done by Oertel had shown that this was possible. So 
our main emphasis was first to establish that the data are the 
product of a Markov process and then to find these parameters. 

As shown in the analysis, we were able to describe the 
learning process as a Markov process by testing for 
stationarity and independence. Once these properties were 
established, we were able to use all the assumptions 
mentioned earlier. The distribution, once found, enabled us 
to predict the expected number of responses required for any 
given percentage of students to be in the learned state. 
This would help the designer of the program to determine his 
requirement for the number of problems, depending upon his 
target of achievement. 
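
The stationarity and independence checks rest on counts of adjacent response pairs, the 0-0, 0-1, 1-0 and 1-1 sequences tabulated in Appendix B. A minimal sketch of that counting step is given below; the function name is illustrative rather than taken from the original analysis, and the example string is one of the response strings listed in the raw data.

```python
from collections import Counter


def pair_counts(responses):
    """Count adjacent response pairs in a string of 0/1 responses."""
    # Keep only the response digits, so stray spaces are ignored.
    digits = [ch for ch in responses if ch in "01"]
    counts = Counter(a + "-" + b for a, b in zip(digits, digits[1:]))
    return {key: counts.get(key, 0) for key in ("0-0", "0-1", "1-0", "1-1")}


print(pair_counts("10000001001"))
# prints {'0-0': 6, '0-1': 2, '1-0': 2, '1-1': 0}
```

Dividing each count by the total for its starting response (for example 0-1 by the sum of 0-0 and 0-1) gives an estimate of the corresponding transition probability.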

The next step was to determine the values of the 
parameters T and C. The linear regression equation helped 
us in predicting the expected number of incorrect responses 
when the total number of responses was known. If the 
designer of the program can determine the number of 
responses required to be in the solution state, he can 
determine a branching criterion easily. The rule could be 
made that if a subject made more than a specified number of 
incorrect responses, he should be branched. Once the value 
of T was found, it was an easy step to find the value of C. 
These values can be used to calculate different 
probabilities as shown in the theory. 
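
To illustrate the regression step and the branching rule, the sketch below fits an ordinary least-squares line to the (L, T) pairs listed in Table 5 and uses the predicted number of errors as a cut-off. The coefficients are recomputed from the rounded table entries, so they only approximate the regression equation used in the study; the function names and the choice of 47 responses as a target are illustrative assumptions.

```python
# (L, T) pairs copied from the "L to T" columns of Table 5.
TOTAL_RESPONSES = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
EXPECTED_ERRORS = [1, 3, 5, 7, 9, 11, 13, 15, 18, 20]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def expected_errors(total_responses, slope, intercept):
    """Predicted number of incorrect responses (T) for a given L."""
    return slope * total_responses + intercept


def should_branch(errors_so_far, error_limit):
    """Branch a subject who has made more errors than the specified limit."""
    return errors_so_far > error_limit


slope, intercept = fit_line(TOTAL_RESPONSES, EXPECTED_ERRORS)
# Hypothetical target: 47 responses, one of the entries of Table 4.
limit = expected_errors(47, slope, intercept)
print(round(slope, 4), round(intercept, 4), round(limit, 2))
print(should_branch(12, limit), should_branch(3, limit))
```

The limit is left as a plain number so that a designer can replace it with any other criterion without changing the rule itself.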

In the next step we tried to find some kind of 
representation of incorrect responses. This was done in 
order to be able to predict the students to be branched by 
observing the first 10 or 15 responses. This was done by 
different methods such as density, pattern, and frequency. 
Unfortunately we were unable to find any significant trends. 
The reason for not finding the trend could be that there is 
none, but it could also be that we did not have a sufficient 
number of response strings. 

It is suggested that if further work is done in the 
future, the data collected should be at least four to five 
times the size of the present data. If trends are still not 
visible with that data, it will suggest that they do not 
exist; however, if a trend is observed, it would be a great 
help to the designer of the program in determining the 
branching rule right after the first few responses. As 
stated, this would save much effort and time for both students 
and teachers and would be a major factor in reducing the 
cost of running the program. 
Appendix A 


Raw Data 
0 0 0 1 


10000001001 

00100000100001 

000000000100010 

011101010100101 

0 1 1 0 0 0 0 1 

0 0 0 1 

00000000010001 

0 0000100001010 
0 0 1 0 0 0 1 

000011 1001001 10 

00000010000000 1 
111001 100001100 

101100000000101 

0 0000100000000 
1 0 1 0 0 0 0 0 1 0 1 

0 0000000010001 
10 0 10 1 

0 000 0001001100 
0 0 0 1 1 0 1 

0000000101 

010000000010100 

001 100000010101 

000000001 

001 000000001 

0000 100000001 


000010010000000 

1 

00100000011111 1 
0000010000101 

0 0 0 0 0 1 

111111000011000 

0000010111 01010 

000001000000010 

001010010010001 


0 

0 

0 

0 
0 0 0 0 0 0 1 


0000010001001 

00010000000001 

00001100010000001000000001000 0 
0 0 1 0 0 0 1 

10000000010101000100110001100 0 
00011101000101001 

000000000 1 

0 0 0 1 0 0 0 1 

0 0 0 0 0 1 

1000000001 

1 0 0 0 1 0 0 0 0 1 

0000000001000000001001 

00000001000000000100000100000 0 
0010000000001000000000101 

1 0 0 0 0 0 0 1 

001101010000000010001 

00010000001 

00000001000000010000000001 
0 0 0 0 1 
0 1 0 0 0 1 

10101100100001 
10100001000000001 
0 0 0 1 

0 00010001000100000000010000000 
0 1 

0 0 0 0 0 0 0 1 

00000000100000000010000010000 0 
0 10000000010000000100000000010 
0 0 0 0 0 0 0 1 

1 

0 0 1 

0 01000010011010000100110000011 
011000000001000011 
Appendix B


Frequencies of (1-1, 1-0, 0-1, 0-0) sequences


              frequencies of sequences
subject      0-0      1-1      1-0      0-1

   1           2        0        0        1
   2           6        0        2        2
   3           8        0        2        3
   4          21        2       10       11
   5           3        1        1        2
   6           2        0        0        1
   7          10        0        1        2
   8          23        0        6        7
   9           5        2        3        4
  10          30       10        8        9
  11          11        1        4        4
  12          21        6        6        7
  13          16        1        8        9
  14          23        2        5        6
  15           6        0        1        2
  16          12        0        8        8
  17           6        1        3        4
  18           7        0        0        1
  19           8        0        1        2
  20           9        0        1        2
  21           5        0        0        1
  22           7        0        2        3
  23          10        0        1        2
  24          24        0        5        6
  25          20        4       11       11
  26           8        0        0        1
  27           4        0        1        2
  28           4        0        0        1
  29           7        0        1        1
  30           5        0        2        2
  31          16        0        2        3
  32          41        0        6        7
  33           5        0        1        1
  34          10        1        4        5
  35           7        0        1        2
  36          20        0        2        3
  37          30        2       12       12
  38           3        0        0        1
  39           2        0        1        2
  40           4        1        4        4
  41          10        0        3        3
  42           2        0        0        1
  43          22        0        4        5
  44           6        0        0        1
  45          52        0        7        8
  46           0        0        0        0
  47           1        0        0        1
  48          23        5        9       10
Appendix C


Histogram of the data 
Appendix D


Chi square goodness-of-fit test


Ho = the distribution is exponential
H1 = the distribution is not exponential
alpha = 0.1


interval    1-exp(-alpha x)    theo freq    obs freq    dif

   10           0.33              16           18        2
   20           0.55              10            9        1
   30           0.70               8            7        1
   40           0.80               4            5        1
   50           0.865              4            4        0
   60           0.91               2            1        1
   70           0.94               1            1        0


Chi. Sqr. = 1.1125
Chi. Sqr. (.05) = 1.64
df = 6

Result: accept Ho
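
The test statistic can be recomputed from the tabled frequencies, and the sketch below does so while also regenerating the theoretical c.d.f. column. Two points are assumptions rather than statements from this appendix: the rate of 0.04 is inferred from the tabled c.d.f. values (0.33 at x = 10, 0.55 at x = 20, and so on), which a rate of 0.1 would not reproduce, and the frequencies used are the rounded table entries.

```python
import math

UPPER_LIMITS = [10, 20, 30, 40, 50, 60, 70]   # interval column
THEORETICAL = [16, 10, 8, 4, 4, 2, 1]         # theo freq column
OBSERVED = [18, 9, 7, 5, 4, 1, 1]             # obs freq column
RATE = 0.04  # assumption: inferred from the tabled c.d.f. values


def exponential_cdf(x, rate):
    """Exponential c.d.f. 1 - exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x)


def chi_square(observed, expected):
    """Pearson chi-square statistic, sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


cdf_values = [round(exponential_cdf(x, RATE), 3) for x in UPPER_LIMITS]
statistic = chi_square(OBSERVED, THEORETICAL)
print(cdf_values)
print(round(statistic, 4))
```

With the rounded frequencies the statistic comes out at 1.225, close to but not identical with the reported 1.1125, presumably because unrounded theoretical frequencies were used originally. The tabled 1.64 corresponds to the lower 5 percent point of chi-square with 6 degrees of freedom; the upper 5 percent point from standard tables is about 12.59, and the statistic lies below both, so the decision to accept Ho is unaffected.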
Appendix E


Graphical representation of the linear relationship between
observed and generated data

Cumulative distribution function and probability
distribution function values